Correcting Bias in Statistical Tests for Network Classifier Evaluation

Authors

  • Tao Wang
  • Jennifer Neville
  • Brian Gallagher
  • Tina Eliassi-Rad
Abstract

It is difficult to directly apply conventional significance tests to compare the performance of network classification models because network data instances are not independent and identically distributed. Recent work [6] has shown that paired t-tests applied to overlapping network samples will result in unacceptably high levels (e.g., up to 50%) of Type I error (i.e., the tests lead to incorrect conclusions that models are different, when they are not). Thus, we need new strategies to accurately evaluate network classifiers. In this paper, we analyze the sources of bias (e.g., dependencies among network data instances) theoretically and propose analytical corrections to standard significance tests to reduce the Type I error rate to more acceptable levels, while maintaining reasonable levels of statistical power to detect true performance differences. We validate the effectiveness of the proposed corrections empirically on both synthetic and real networks.
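
The Type I error inflation mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method or its proposed correction. It assumes independent instances (no network dependencies at all) and only shows how repeatedly drawing overlapping test samples from one fixed pool can mislead a paired t-test when two models have the same true error rate. All names and parameter values (`type_i_error_rate`, `POOL_SIZE`, the 0.3 error rate, and so on) are hypothetical choices made for illustration.

```python
import math
import random
import statistics

N_SAMPLES = 10       # number of test samples per experiment (assumed)
SAMPLE_SIZE = 100    # instances per test sample (assumed)
POOL_SIZE = 200      # size of the fixed pool that overlapping samples share (assumed)
ERROR_RATE = 0.3     # both models have this same true error rate (assumed)
T_CRIT_DF9 = 2.262   # two-sided 5% critical value of Student's t, 9 degrees of freedom


def draw_difference(rng):
    """Error of model A minus error of model B on one fresh instance."""
    a_wrong = rng.random() < ERROR_RATE
    b_wrong = rng.random() < ERROR_RATE
    return int(a_wrong) - int(b_wrong)


def rejects_null(diffs):
    """Paired t-test on per-sample mean differences at the 5% level."""
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        # Degenerate case: every sample gave the same difference.
        return mean != 0.0
    t = mean / (sd / math.sqrt(len(diffs)))
    return abs(t) > T_CRIT_DF9


def type_i_error_rate(overlapping, n_experiments=1000, seed=0):
    """Fraction of experiments in which the test wrongly reports a difference."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        if overlapping:
            # All samples come from one fixed pool, so they share instances.
            pool = [draw_difference(rng) for _ in range(POOL_SIZE)]
            diffs = [statistics.fmean(rng.sample(pool, SAMPLE_SIZE))
                     for _ in range(N_SAMPLES)]
        else:
            # Every sample consists of freshly drawn, independent instances.
            diffs = [statistics.fmean(draw_difference(rng)
                                      for _ in range(SAMPLE_SIZE))
                     for _ in range(N_SAMPLES)]
        rejections += rejects_null(diffs)
    return rejections / n_experiments


if __name__ == "__main__":
    print("overlapping samples:", type_i_error_rate(overlapping=True))
    print("independent samples:", type_i_error_rate(overlapping=False))
```

Under these assumptions the overlapping setting can be expected to reject the null hypothesis far more often than the nominal 5%, while the independent setting should stay close to it. In the overlapping case the samples agree with each other about the pool's chance imbalance, so the test treats that imbalance as a real difference between the models.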

Similar resources

Using a Supervised Committee Machine Neural Network to Improve Neural Network Algorithms for Estimating the Permeability of Oil Reservoirs

Reservoir permeability is a critical parameter for the evaluation of hydrocarbon reservoirs. Many well log data are related to this parameter. In this study, permeability is predicted from these data using a supervised committee machine neural network (SCMNN) that combines 30 estimators. All of the data were divided into two populations, low and high permeability, using a statistical study. E...

Generalization Capability of Homogeneous Voting Classifier Based on Partially Replicated Data

The generalization error is one of the most important features taken into account in performance evaluation and verification of any classifier. We propose a voting system based on homogeneous base classifiers (HVC) which ensures a better generalization capability than any of its components. The principle of this idea consists in the differentiation of a learning data set for each base classifier...

A novel method for correcting scanline-observational bias of discontinuity orientation

Scanline observation is known to introduce an angular bias into the probability distribution of orientation in three-dimensional space. In this paper, numerical solutions expressing the functional relationship between the scanline-observational distribution (in one-dimensional space) and the inherent distribution (in three-dimensional space) are derived using probability theory and calculus und...

Fault diagnosis in a distillation column using a support vector machine based classifier

Fault diagnosis has always been an essential aspect of control system design. This is necessary due to the growing demand for increased performance and safety of industrial systems. The support vector machine classifier is a new technique based on statistical learning theory and is designed to reduce structural bias. Support vector machine classification in many applications in v...

Using Error-Correcting Output Codes with Model-Refinement to Boost Centroid Text Classifier

In this work, we investigate the use of error-correcting output codes (ECOC) for boosting the centroid text classifier. The implementation framework is to decompose one multi-class problem into multiple binary problems and then learn the individual binary classification problems with the centroid classifier. However, this kind of decomposition incurs considerable bias for the centroid classifier, which resul...

Publication date: 2011